An Efficient Uniform-Cost Normalized Edit Distance Algorithm

Authors

  • Abdullah N. Arslan
  • Ömer Egecioglu

Abstract

A common model for computing the similarity of two strings X and Y of lengths m and n, respectively, with m ≥ n, is to transform X into Y through a sequence of three types of edit operations: insertion, deletion, and substitution. The model assumes a given cost function that assigns a non-negative real weight to each edit operation. The amortized weight of a given edit sequence is the ratio of its weight to its length, and the minimum of this ratio over all edit sequences is the normalized edit distance. Existing algorithms for normalized edit distance computation with proven complexity bounds require O(mn^2) time in the worst case. We give an O(mn log n)-time algorithm for the problem when the cost function is uniform, i.e., the weight of each edit operation is constant within the same type, except that substitutions can have different weights depending on whether they are matching or non-matching.
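
To make the definition concrete, here is a minimal Python sketch that computes the normalized edit distance directly from its definition under a uniform cost function. It is not the O(mn log n) algorithm announced in the abstract: it is a plain dynamic program indexed by edit-sequence length, running in O(mn(m+n)) time and space. The function name, the parameter names, and the default weights (1 for insertion, deletion, and non-matching substitution, 0 for matching substitution) are assumptions chosen for illustration, as is the convention that two empty strings have distance 0.

```python
from math import inf


def normalized_edit_distance(x, y, w_ins=1.0, w_del=1.0,
                             w_match=0.0, w_mismatch=1.0):
    """Minimum of weight/length over all edit sequences turning x into y.

    Illustrative brute-force DP; names and default weights are assumptions.
    """
    m, n = len(x), len(y)
    if m == 0 and n == 0:
        return 0.0  # assumed convention: no operations are needed

    max_len = m + n  # the longest edit sequence deletes all of x, inserts all of y
    # best[i][j][k]: minimum weight of transforming x[:i] into y[:j]
    # using exactly k edit operations (matching substitutions count as operations).
    best = [[[inf] * (max_len + 1) for _ in range(n + 1)] for _ in range(m + 1)]
    best[0][0][0] = 0.0

    for i in range(m + 1):
        for j in range(n + 1):
            for k in range(1, i + j + 1):
                cand = inf
                if i > 0:  # delete x[i-1]
                    cand = min(cand, best[i - 1][j][k - 1] + w_del)
                if j > 0:  # insert y[j-1]
                    cand = min(cand, best[i][j - 1][k - 1] + w_ins)
                if i > 0 and j > 0:  # substitute x[i-1] by y[j-1]
                    w = w_match if x[i - 1] == y[j - 1] else w_mismatch
                    cand = min(cand, best[i - 1][j - 1][k - 1] + w)
                best[i][j][k] = cand

    # Every edit sequence has length between max(m, n) and m + n.
    return min(best[m][n][k] / k
               for k in range(max(m, n), m + n + 1)
               if best[m][n][k] < inf)
```

For example, with the default weights, `normalized_edit_distance("ab", "a")` evaluates the sequence "match a, delete b" with weight 1 and length 2, giving 0.5. The extra length dimension is what makes this approach cubic, and it is this cost that the paper's algorithm improves on for uniform weights.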

Similar resources

An Efficient Uniform-cost Normalized Edit Distance Algorithm

A common model for computing the similarity of two strings X and Y of lengths m and n, respectively, with m ≥ n, is to transform X into Y through a sequence of edit operations which are of three types: insertion, deletion, and substitution of symbols. The model assumes a given weight function which assigns a non-negative real cost to each of these edit operations. The amortized weight for a given ...

Efficient Algorithms For Normalized Edit Distance

ÖMER EGECIOGLU, Department of Computer Science, University of California, Santa Barbara, CA 93106, USA. ABSTRACT: A common model for computing the similarity of two strings X and Y of lengths m and n respectively, with m ≥ n, is to transform X into Y through a sequence of edit operations, called an edit sequence. The edit operations are of three types: insertion, deletion, a...

Computation of Normalized Edit Distance and Applications

Given two strings X and Y over a finite alphabet, the normalized edit distance between X and Y, d(X, Y), is defined as the minimum of W(P)/L(P), where P is an editing path between X and Y, W(P) is the sum of the weights of the elementary edit operations of P, and L(P) is the number of these operations (length of P). In this paper, it is shown that in general, d(X, Y) canno...

Parallel algorithms for fast computation of normalized edit distances

We give work-optimal and polylogarithmic time parallel algorithms for solving the normalized edit distance problem. The normalized edit distance between two strings X and Y with lengths n ≥ m is the minimum quotient of the sum of the costs of edit operations transforming X into Y by the length of the edit path corresponding to those edit operations. Marzal and Vidal proposed a sequential algorith...

Faster algorithms for guided tree edit distance

The guided tree edit distance problem is to find a minimum cost series of edit operations that transforms two input forests F and G into isomorphic forests F ′ and G′ such that a third input forest H is included in F ′ (and G′). The edit operations are relabeling a vertex and deleting a vertex. We show efficient algorithms for this problem that are faster than the previous algorithm for this pr...

Publication date: 1999